118 research outputs found

    Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

    The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.

    Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data

    Pseudotime algorithms can be employed to extract latent temporal information from cross-sectional data sets allowing dynamic biological processes to be studied in situations where the collection of time series data is challenging or prohibitive. Computational techniques have arisen from single-cell 'omics and cancer modelling where pseudotime can be used to learn about cellular differentiation or tumour progression. However, methods to date typically implicitly assume homogeneous genetic, phenotypic or environmental backgrounds, which becomes limiting as data sets grow in size and complexity. We describe a novel statistical framework that learns how pseudotime trajectories can be modulated through covariates that encode such factors. We apply this model to both single-cell and bulk gene expression data sets and show that the approach can recover known and novel covariate-pseudotime interaction effects. This hybrid regression-latent variable model framework extends pseudotemporal modelling from its most prevalent area of single-cell genomics to wider applications.

    Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.

    MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalization. RESULTS: We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalization and visualization of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. AVAILABILITY AND IMPLEMENTATION: The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    A descriptive marker gene approach to single-cell pseudotime inference

    Motivation: Pseudotime estimation from single-cell gene expression data allows the recovery of temporal information from otherwise static profiles of individual cells. Conventional pseudotime inference methods emphasize an unsupervised transcriptome-wide approach and use retrospective analysis to evaluate the behaviour of individual genes. However, the resulting trajectories can only be understood in terms of abstract geometric structures and not in terms of interpretable models of gene behaviour. Results: Here we introduce an orthogonal Bayesian approach termed ‘Ouija’ that learns pseudotimes from a small set of marker genes that might ordinarily be used to retrospectively confirm the accuracy of unsupervised pseudotime algorithms. Crucially, we model these genes in terms of switch-like or transient behaviour along the trajectory, allowing us to understand why the pseudotimes have been inferred and learn informative parameters about the behaviour of each gene. Since each gene is associated with a switch or peak time the genes are effectively ordered along with the cells, allowing each part of the trajectory to be understood in terms of the behaviour of certain genes. We demonstrate that this small panel of marker genes can recover pseudotimes that are consistent with those obtained using the entire transcriptome. Furthermore, we show that our method can detect differences in the regulation timings between two genes and identify ‘metastable’ states (discrete cell types along the continuous trajectories) that recapitulate known cell types. Availability and implementation: An open source implementation is available as an R package at http://www.github.com/kieranrcampbell/ouija and as a Python/TensorFlow package at http://www.github.com/kieranrcampbell/ouijaflow. Supplementary information: Supplementary data are available at Bioinformatics online.

    Single-cell sequencing of iPSC-Dopamine neurons reconstructs disease progression and identifies HDAC4 as a regulator of Parkinson cell phenotypes

    Induced pluripotent stem cell (iPSC)-derived dopamine neurons provide an opportunity to model Parkinson’s disease (PD), but neuronal cultures are confounded by asynchronous and heterogeneous appearance of disease phenotypes in vitro. Using high-resolution, single-cell transcriptomic analyses of iPSC-derived dopamine neurons carrying the GBA-N370S PD risk variant, we identified a progressive axis of gene expression variation leading to endoplasmic reticulum stress. Pseudotime analysis of genes differentially expressed (DE) along this axis identified the transcriptional repressor histone deacetylase 4 (HDAC4) as an upstream regulator of disease progression. HDAC4 was mislocalized to the nucleus in PD iPSC-derived dopamine neurons and repressed genes early in the disease axis, leading to late deficits in protein homeostasis. Treatment of iPSC-derived dopamine neurons with HDAC4-modulating compounds upregulated genes early in the DE axis and corrected PD-related cellular phenotypes. Our study demonstrates how single-cell transcriptomics can exploit cellular heterogeneity to reveal disease mechanisms and identify therapeutic targets.

    Using Ontario's "Telehealth" health telephone helpline as an early-warning system: a study protocol

    BACKGROUND: The science of syndromic surveillance is still very much in its infancy. While a number of syndromic surveillance systems are being evaluated in the US, very few have had success thus far in predicting an infectious disease event. Furthermore, to date, the majority of syndromic surveillance systems have been based primarily in emergency department settings, with varying levels of enhancement from other data sources. While research has been done on the value of telephone helplines on health care use and patient satisfaction, very few projects have looked at using a telephone helpline as a source of data for syndromic surveillance, and none have been attempted in Canada. The notable exception to this statement has been in the UK where research using the national NHS Direct system as a syndromic surveillance tool has been conducted. METHODS/DESIGN: The purpose of our proposed study is to evaluate the effectiveness of Ontario's telephone nursing helpline system as a real-time syndromic surveillance system, and how its implementation, if successful, would have an impact on outbreak event detection in Ontario. Using data collected retrospectively, all "reasons for call" and assigned algorithms will be linked to a syndrome category. Using different analytic methods, normal thresholds for the different syndromes will be ascertained. This will allow for the evaluation of the system's sensitivity, specificity and positive predictive value. The next step will include the prospective monitoring of syndromic activity, both temporally and spatially. DISCUSSION: As this is a study protocol, there are currently no results to report. However, this study has been granted ethical approval, and is now being implemented. It is our hope that this syndromic surveillance system will display high sensitivity and specificity in detecting true outbreaks within Ontario, before they are detected by conventional surveillance systems. Future results will be published in peer-reviewed journals so as to contribute to the growing body of evidence on syndromic surveillance, while also providing a non-US-centric perspective.

    Irish cardiac society - Proceedings of annual general meeting held 20th & 21st November 1992 in Dublin Castle
